ABSTRACT

The SET- and MYND-domain containing (Smyd) proteins constitute a special subfamily of the SET-containing lysine methyltransferases. Here we present the structure of full-length human Smyd3 in complex with S-adenosyl-L-homocysteine at 2.8 Å resolution. Smyd3 provides the first example in which regions other than the SET domain and its flanking regions participate in forming the active site. Structural analysis shows that the previously uncharacterized C-terminal domain of Smyd3 contains a tetratrico-peptide repeat (TPR) domain, which together with the SET and post-SET domains forms a deep, narrow substrate binding pocket. Our data demonstrate the important roles of both the TPR and post-SET domains in the histone lysine methyltransferase (HKMT) activity of Smyd3, and show that the hydroxyl group of Tyr239 is critical for the enzymatic activity. The characteristic MYND domain is located near the substrate binding pocket and exhibits a largely positively charged surface. Further biochemical assays show that DNA binding by Smyd3 can stimulate its HKMT activity and that this process may be mediated by direct DNA binding of the MYND domain.

INTRODUCTION 

It is well established that covalent modifications of histones are involved in the regulation of chromatin structure and function and hence play critical roles in development and disease pathogenesis (1). Histone methylation, one of the major forms of histone modification, has been shown to exert important functions in various biological processes such as heterochromatin formation, X-chromosome inactivation and transcriptional regulation (2). Methylation of histones can occur at different lysine or arginine residues and is catalyzed by the corresponding methyltransferases. Since the discovery about a decade ago that SET domain-containing proteins can selectively methylate lysine residues of histones (3,4), the SET family histone lysine methyltransferases (HKMTs) have been found to be responsible for methylation of all histone lysine residues (H3K4, H3K9, H3K27, H3K36 and H4K20) except one (H3K79) (5).

In contrast to the vast majority of SET-containing lysine methyltransferases, five proteins share a special characteristic in which the SET domain is split by a myeloid-Nervy-DEAF-1 (MYND) domain insertion, and they thus cluster into a subfamily named SET- and MYND-domain containing proteins (Smyd) (6,7). In the Smyd subfamily, three proteins have been shown to possess methyltransferase activity: Smyd1 specifically methylates histone H3-Lys4 (H3K4) (8); Smyd2 harbors methyltransferase activity towards histone H3-Lys36 (H3K36) (7,9), p53-Lys370 (10), and probably H3K4 as well (9); and Smyd3 specifically catalyzes di- and tri-methylation of H3K4 (6) and methylation of Lys831 of vascular endothelial growth factor receptor 1 (VEGFR1) (11).

Consistent with the HKMT function of Smyd3 in methylating H3K4, a hallmark of active gene transcription (12-14), Smyd3 has been demonstrated to bind the RNA helicase HELZ and hence to associate with RNA polymerase II for transcription elongation (6). The other substrate of Smyd3, VEGFR1, plays an important role in the regulation of angiogenesis and has been shown to be involved in inflammatory responses, tumor growth and atherosclerosis (15). Given the functional roles of Smyd3 in gene transcription and the VEGFR1 signaling pathway, it is not surprising that dysregulation of Smyd3 is involved in disease pathogenesis. Smyd3 was originally identified as a gene overexpressed in hepatocellular carcinoma (HCC) and colorectal carcinoma (CRC) cells (6). Upregulation of Smyd3 promoted the growth of HCC and CRC cells (6), and the presence of three tandem repeats of an E2F-1-binding element in the Smyd3 promoter region is a risk factor for HCC, CRC and breast cancer (16).

Despite the important functional roles of Smyd3 and its association with cancers, knowledge about the molecular mechanism of its methyltransferase activity and the regulation of this enzymatic activity is quite limited. It has been suggested that the MYND domain might bind specific DNA elements, and a chaperone, heat shock protein 90α (Hsp90α), has been demonstrated to interact with Smyd3 and enhance its HKMT activity in a dose-dependent manner (6). However, the potential function and structural basis of DNA binding by the MYND domain of Smyd3, and the mechanism by which Hsp90α regulates the HKMT activity of Smyd3, remain unclear. In addition, more fundamental questions about Smyd3 and the other Smyd proteins remain to be answered. For example, it is well known that the 'pre-SET' and 'post-SET' domains are necessary for the activity of SUV39H1 (3); however, the Smyd subfamily members have no 'pre-SET' domain, and the exact function of their 'post-SET' domain has not yet been investigated. Moreover, all Smyd proteins except Smyd5 have a large C-terminal region, but the structure and function of this region are still unknown.

We carried out structural and functional studies 
of human Smyd3, and present here the crystal structure 
of Smyd3 in complex with the cofactor product, 
S-adenosyl-L-homocysteine (AdoHcy) at 2.8 Å resolution.
Comparison of the Smyd3-AdoHcy complex with the 
reported structures of other SET enzymes (17-29) 
reveals unique structural characteristics of Smyd3. Based 
on the structural information, we further performed 
mutagenesis analyses and biochemical assays. Together, 
our results reveal the unique features of Smyd3 
compared with the other SET methyltransferases, and 
provide insights into the structural basis and regulatory 
mechanism of the HKMT activity of Smyd3. 

MATERIALS AND METHODS 

Protein expression and purification 

The full-length human Smyd3 gene was amplified by PCR from HEK293 cDNA and subcloned into the pET-28a and pET-28b vectors (Novagen) with a His6 tag at the N-terminus or the C-terminus. The constructed plasmids were transformed into Escherichia coli BL21 (DE3) Codon Plus strain. The bacterial cells were grown in LB medium at 37°C to an OD600 of 0.6 and induced with 0.2 mM isopropyl-β-D-thiogalactopyranoside at 16°C for 24 h. The cells were collected by centrifugation at 6000 g, suspended in lysis buffer [50 mM Tris-HCl (pH 8.0), 300 mM NaCl, 5 mM β-mercaptoethanol, 10% glycerol and 1 mM phenylmethylsulfonyl fluoride], and lysed on ice by sonication. The cell lysate was clarified by centrifugation at 18 000 for 30 min, and the supernatant was used for protein purification.

The human Smyd3 protein was purified by affinity chromatography using a Ni²⁺-NTA column (Qiagen) equilibrated with binding buffer [20 mM Tris-HCl (pH 8.0), 300 mM NaCl and 5 mM β-mercaptoethanol]. The column was washed with binding buffer supplemented with 30 mM imidazole, and the target protein was then eluted with binding buffer supplemented with 200 mM imidazole. The protein sample was further purified by gel filtration on a Superdex 200 16/60 column (Amersham Biosciences). Finally, half of the protein was stored in storage buffer A [20 mM Tris-HCl (pH 8.0), 100 mM NaCl and 1 mM DTT] and the rest in storage buffer B [20 mM Tris-HCl (pH 8.0), 50 mM Li₂SO₄ and 1 mM DTT].

The Smyd3 mutants were generated using the 
QuikChange site-directed mutagenesis kit (Stratagene) 
and verified by sequencing. Expression and purification 
of the mutant proteins were performed following the 
same procedure as for the wild-type protein. 

Crystallization, diffraction data collection and structure 
determination 

The purified C-terminally tagged Smyd3 protein was concentrated to 5-10 mg/ml in storage buffers A and B and then incubated with 600 μM AdoHcy. Crystallization was performed using the hanging-drop vapor diffusion method. Crystals belonging to space group P2₁ (form I) were obtained at 4°C by mixing equal volumes of the protein in storage buffer A and reservoir solution containing 80 mM sodium cacodylate (pH 6.5), 160 mM calcium acetate, 14.4% polyethylene glycol 8000 and 20% glycerol. Crystals belonging to space group P2₁2₁2₁ (form II) were obtained with the protein in storage buffer B and reservoir solution containing 2.8 M sodium acetate (pH 7.0).

The purified N-terminally tagged Smyd3 protein was concentrated to 7.5-15 mg/ml in storage buffer A and then incubated with 600 μM AdoHcy. Crystals belonging to space group P6₁ (form III) were obtained using the sitting-drop vapor diffusion method with reservoir solution containing 0.1 M bicine (pH 9.0), 0.1 M NaCl and 20% polyethylene glycol monomethyl ether 550.

The diffraction data were collected from flash-cooled crystals at 100 K at beamline 17U of the Shanghai Synchrotron Radiation Facility, China, and were processed, integrated and scaled with HKL2000 (30). The statistics of the diffraction data are summarized in Table 1. The structure of Smyd3 in complex with AdoHcy was solved by molecular replacement using CNS (31), with the structure of Smyd3 in complex with S-adenosyl-L-methionine (AdoMet) (PDB code 3MEK) as the search model. Model building was performed using Coot (32).
Table 1. Summary of diffraction data and structure refinement statistics

                                    Form I                 Form II                Form III
Diffraction data (a)
  Wavelength (Å)                    0.99985                0.97908                0.97908
  Space group                       P2₁                    P2₁2₁2₁                P6₁
  Cell parameters
    a, b, c (Å)                     58.0, 118.0, 82.8      55.0, 101.0, 117.3     103.4, 103.4, 112.2
    α, β, γ (°)                     90.0, 91.8, 90.0       90, 90, 90             90, 90, 120
  Resolution (Å)                    50.0-2.8 (2.90-2.80)   50-3.6 (3.73-3.60)     50-3.4 (3.52-3.40)
  Observed reflections              86 956                 29 147                 69 514
  Unique reflections (I/σ(I) > 0)   26 338                 6662                   9395
  Average redundancy                3.3 (3.0)              4.4 (3.0)              7.4 (5.9)
  Average I/σ(I)                    9.8 (2.9)              10.1 (2.1)             16.6 (2.5)
  Completeness (%)                  97.9 (94.4)            82.8 (68.4)            99.9 (100)
  Rmerge (%) (b)                    12.2 (33.9)            12.6 (33.3)            13.0 (64.9)
Refinement and structure model
  Reflections (Fo > 0σ(Fo))
    Working set                     24 970                 6299
    Test set                        1323                   340                    487
  R factor/free R factor (%)        21.1/26.1              24.1/2                 22.4/24.9
  Number of non-H atoms             6896                   3443                   3426
  Number of amino acid residues     851                    426                    424
  Number of water molecules         24
  Average B factor (Å²)
    All atoms                       40.3                   103.4                  97.4
    Protein                         40.3                   103.4                  97.8
    Ligand/ion                      39.8/39.4              108.1/104.4            54.7/58.6
    Water                           24.6
  RMSD
    Bond lengths (Å)                0.007                  0.009                  0.008
    Bond angles (°)                 1.1                    1.2                    1.1
  Ramachandran plot (%)
    Most favoured regions           92.1                   83.6                   86.2
    Allowed regions                 7.7                    15.9                   12.5
    Generously allowed regions      0.3                    0.5                    1.3

(a) Numbers in parentheses represent the highest resolution shell.
(b) $R_{\mathrm{merge}} = \sum_{hkl}\sum_{i} |I_i(hkl) - \langle I(hkl) \rangle| \,/\, \sum_{hkl}\sum_{i} I_i(hkl)$



Structure refinement was carried out using CNS (31) and REFMAC5 (33). The stereochemical geometry of the structures was analyzed using Procheck (34). The figures were generated using PyMOL (http://www.pymol.org). The statistics of the structure refinement and the quality of the final structure models are also summarized in Table 1. The structural model derived from form I crystals has the highest resolution and the best quality. For this crystal form, there are two molecules in the asymmetric unit; NCS restraints were applied during the initial refinement but released at the later stages of refinement. The two molecules assume almost identical overall structures [superposition of all Cα atoms yields a root-mean-square deviation (RMSD) of 0.27 Å], and the molecule with more detectable residues and better electron density was chosen for the structural analyses and discussion in this article.
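The data-quality and refinement statistics in Table 1 follow standard definitions: Rmerge compares repeated measurements of the same reflection, and the R factor compares observed with calculated structure-factor amplitudes, evaluated on the working set (R) or on the test set held out of refinement (free R). The Python sketch below restates those definitions for illustration only; it is not the code inside HKL2000, CNS or REFMAC5. The function names are hypothetical, singly measured reflections are skipped in Rmerge (one common convention), and calculated amplitudes are assumed to be already scaled to the observed ones. It also rechecks two quantities derivable from Table 1: average redundancy (observed/unique reflections) and the test-set fraction for form I.

```python
from statistics import mean


def r_merge(observations):
    """R_merge = sum_hkl sum_i |I_i(hkl) - <I(hkl)>| / sum_hkl sum_i I_i(hkl).

    `observations` maps a Miller index (h, k, l) to the list of intensities
    measured for it. Reflections measured only once carry no information
    about agreement and are skipped (an assumed convention).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        mean_i = mean(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0:
        raise ValueError("no multiply measured reflections")
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum ||Fo| - |Fc|| / sum |Fo|, with |Fc| assumed already scaled to |Fo|.

    Over the working set this is the R factor; over the test set
    (reflections excluded from refinement) it is the free R factor.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    diff = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return diff / sum(abs(fo) for fo in f_obs)


def average_redundancy(observed, unique):
    """Average redundancy (multiplicity) = observed / unique reflections."""
    return observed / unique


if __name__ == "__main__":
    # Toy intensities (hypothetical); Table 1 reports Rmerge as a percentage.
    toy = {(1, 0, 0): [100, 110, 90], (0, 1, 0): [50, 50], (0, 0, 1): [70]}
    print(f"R_merge = {100 * r_merge(toy):.1f}%")  # R_merge = 5.0%
    # Table 1, form I: 86 956 observed and 26 338 unique reflections.
    print(round(average_redundancy(86956, 26338), 1))  # 3.3
    # Form I test set: 1323 of 24 970 + 1323 reflections, in percent.
    print(round(100 * 1323 / (24970 + 1323), 1))  # 5.0
```

The same redundancy arithmetic reproduces the 4.4 (form II) and 7.4 (form III) values in Table 1.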

Gel shift assay 

Different amounts of the wild-type and mutant Smyd3 proteins were incubated with 1 μg of a 6-bp DNA (5'-CCCTCC-3') in a buffer containing 50 mM Tris-HCl (pH 7.4), 100 mM NaCl and 1 mM DTT. After incubation at 30°C for 2 h, the samples were loaded onto a 2% agarose gel and visualized under UV light with GelGreen staining.

In vitro HKMT activity assay 

For the HKMT activity assay, 10 μg of each of the wild-type and mutant Smyd3 proteins was incubated with 40 μg of histone mixture extracted from calf thymus (Sigma) along with 0.5 μCi [methyl-³H]-S-adenosyl-L-methionine (PerkinElmer Life Sciences) as the methyl donor for 2 h at 30°C, in a total volume of 40 μl of buffer containing 50 mM Tris-HCl, 100 mM NaCl and 1 mM DTT. The HKMT activity was analyzed by liquid scintillation counting. As a negative control, the background reading was measured from an assay containing the wild-type Smyd3 and the labeled cofactor but no histone mixture, and was subtracted from the readings of the other assays. For some Smyd3 mutants, the experimental readings were slightly lower than the background reading, resulting in apparent 'negative activities' after background subtraction; this indicates that these mutants had no detectable enzymatic activity towards H3K4. The HKMT activity of Smyd3 increases substantially with increasing pH (Supplementary Figure S1). For analyses of the effects of mutations of active-site residues, the assays were therefore performed at pH 8.0. To examine the stimulatory effect of DNA, the HKMT activities of the wild-type Smyd3 and the R66E mutant were measured at physiological pH 7.4 with and without the 6-bp potential target DNA (5'-CCCTCC-3').
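The background correction and normalization described above, which yield the relative activities with standard deviations plotted in Figure 2C, reduce to simple arithmetic. The sketch below assumes triplicate scintillation readings and uses the sample standard deviation; the article does not say which estimator was used. The function names and all readings in the usage example are hypothetical.

```python
from statistics import mean, stdev


def background_subtracted(readings, background):
    """Subtract the no-histone control reading from each scintillation count."""
    return [r - background for r in readings]


def relative_activity(sample_readings, wild_type_readings, background):
    """Return (mean, sample SD) of activity relative to the wild-type mean.

    Each replicate is background-corrected and divided by the mean corrected
    wild-type activity. Values can be negative when a reading falls below
    the background, i.e. the apparent 'negative activities' noted above.
    """
    wt_mean = mean(background_subtracted(wild_type_readings, background))
    if wt_mean <= 0:
        raise ValueError("wild-type activity must exceed the background")
    relative = [v / wt_mean for v in background_subtracted(sample_readings, background)]
    return mean(relative), stdev(relative)


if __name__ == "__main__":
    background = 100  # hypothetical counts from the no-histone control
    wild_type = [1000, 1100, 1200]  # hypothetical triplicate readings
    mutant = [350, 400, 450]
    inactive = [90, 95, 100]  # at or below background
    for name, readings in [("mutant", mutant), ("inactive", inactive)]:
        m, sd = relative_activity(readings, wild_type, background)
        print(f"{name}: {m:.3f} +/- {sd:.3f}")
```

Normalizing every replicate to the wild-type mean, rather than pairing replicates, keeps the wild-type bar at 1.0 and lets a "4-fold decrease" be read directly as a relative activity of about 0.25.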



RESULTS AND DISCUSSION 

Overall structure 

Crystallization of full-length human Smyd3 in complex with the cofactor product AdoHcy was carried out, and three crystal forms belonging to three different space groups (P2₁, P2₁2₁2₁ and P6₁) were obtained (Table 1). The structures derived from the three crystal forms are all similar, with AdoHcy and three Zn²⁺ ions bound at similar positions, and the one refined to the highest resolution (form I, 2.8 Å) was used for further structural analysis and discussion. As shown in Figure 1A and B, the Smyd3-AdoHcy complex assumes a compact globular structure. The N-terminal region, including the SET domain (residues 1-43 and 94-242), the MYND domain (residues 44-93) and the post-SET domain (residues 243-270), has a mixed structure consisting of α helices (α1-α7), β strands (β1-β12) and long extended loops, while the C-terminal region (residues 271-428) is composed mainly of α helices (α8-α15) (Figure 1A and Supplementary Figure S2). Our Smyd3-AdoHcy structure is very similar to the Smyd3 structure in complex with AdoMet (PDB code 3MEK), which was derived from a crystal of space group P2₁2₁2₁, with an RMSD of 0.9 Å for all Cα atoms. In addition, the active-site structures are almost identical in the two complexes, with AdoHcy and AdoMet bound in a similar mode.
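Several values in this article are Cα RMSDs after rigid-body superposition, such as the 0.9 Å comparison with the AdoMet complex above and the 0.27 Å between the two molecules of the asymmetric unit. The article does not name the program used for superposition, so the pure-Python sketch below is only an illustration of the underlying calculation. It uses Horn's quaternion formulation, which gives the same minimum RMSD as the Kabsch SVD method, with a small Jacobi eigenvalue solver. The function names are hypothetical, and the inputs are assumed to be matched lists of (x, y, z) coordinates, for example Cα atoms already paired residue by residue.

```python
import math


def _centred(coords):
    """Shift (x, y, z) points so that their centroid lies at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _symmetric_eigenvalues(a, max_sweeps=50):
    """Eigenvalues of a small real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    scale = sum(v * v for row in a for v in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= 1e-28 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def superposed_rmsd(mobile, reference):
    """Minimum RMSD between paired point sets after optimal rigid-body superposition.

    The largest eigenvalue of Horn's 4x4 matrix gives the best achievable
    overlap, so RMSD^2 = (G_mobile + G_reference - 2 * lambda_max) / N.
    Only proper rotations are considered, so mirror images are not superposed.
    """
    if len(mobile) != len(reference) or not mobile:
        raise ValueError("need two non-empty coordinate lists of equal length")
    x = _centred(mobile)
    y = _centred(reference)
    # Cross-covariance S[i][j] = sum over atoms of x[i] * y[j]
    s = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lambda_max = max(_symmetric_eigenvalues(k))
    g = sum(v * v for p in x for v in p) + sum(v * v for p in y for v in p)
    return math.sqrt(max(0.0, (g - 2.0 * lambda_max) / len(x)))
```

A rotated and translated copy of a structure returns an RMSD of essentially zero. The eigenvalue route avoids an explicit SVD and keeps the sketch dependency-free. In practice a numerical library would be used instead.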

The overall architecture of the SET domain of Smyd3 is similar to that of the SET domains of other HKMTs. The core region of the SET domain (residues 1-40 and 180-242) is composed largely of three canonical β sheets (β1 and β2; β3, β10 and β11; and β4, β7 and β8) and extended loops. Structural comparison of the SET domain of Smyd3 (Smyd3-SET) with those of other SET methyltransferases (17-21,23-29,35-37), especially the H3K4-specific enzymes SET7/9 (18,21,23,25) and MLL1 (29), shows that the SET domain of Smyd3 most closely resembles that of SET7/9, with an RMSD of 2.2 Å based on superposition of the core region of the SET domain (Figure 1B).
In addition, in a surface pocket formed by the SET and post-SET domains, the cofactor product AdoHcy binds in the same position and configuration as in the SET7/9-AdoHcy complex (Figure 1B), and the overall structure of the cofactor binding site is generally similar to that of SET7/9 (see details later). However, unlike that of SET7/9, the post-SET domain of Smyd3 is cysteine-rich, and a Zn²⁺ ion is bound and coordinated by residue Cys208 of the SET domain and residues Cys261, Cys263 and Cys266 of the post-SET domain (Figure 1C, left panel). Comparison of Smyd3 with other methyltransferases containing a cysteine-rich post-SET domain, such as MLL1 (29), Dim5 (22,38) and Suv39H2 (17), shows that the position and coordination of the bound Zn²⁺ in these enzymes are similar; however, the overall structures of the post-SET domains are quite different: the post-SET domain of Smyd3 is composed of an N-terminal α helix (α6), a loop and a C-terminal α helix (α7), whereas that of the other enzymes mainly forms a long loop (Figure 1C, right panel). Intriguingly, although the post-SET region of Smyd3 shares little sequence similarity with that of SET7/9, the N-terminal part of the post-SET region of Smyd3 resembles that of SET7/9 in forming an α helix (α6) (Figure 1C, left panel). These results demonstrate that Smyd3 has a distinctive Zn²⁺-binding post-SET motif.

The characteristic MYND domain forms a relatively independent structure, composed mainly of a long extended loop, a short β sheet (β5 and β6) and two α helices (α1 and α2) (Figure 1D). The MYND domain is characterized by a C6HC zinc-chelating motif, and, as expected, the MYND domain in our structure binds two Zn²⁺ ions (Figure 1D).

Tetratrico-peptide repeat domain 

A search with the Dali server (http://ekhidna.biocenter.helsinki.fi/dali_server) reveals that the C-terminal region mainly comprises a tetratrico-peptide repeat (TPR) domain (residues 280-428, α9-α15 and η5) consisting of three TPR motifs, which are degenerate 34-amino-acid sequences that assume helix-turn-helix structures (Supplementary Figure S3). TPR repeats have been found in many proteins to mediate protein-protein and, sometimes, protein-lipid interactions. It has been reported that Hsp90α binds to Smyd3 to enhance its methyltransferase activity (6), and, intriguingly, numerous TPR domain-containing proteins, including Hop, CyP40, FKBP51/52 and p23, have been shown to bind to Hsp90 (39-42). In the crystal structure of the C-terminal TPR2 domain of Hop in complex with a C-terminal pentapeptide (MEEVD) of Hsp90, the peptide binds to a helical groove and is anchored to the TPR domain of Hop mainly through interactions of a highly conserved two-carboxylate clamp of the peptide with five residues of TPR2, which are also conserved in the TPR1 domain of Hop (43). Although the overall configuration of the TPR domain of Smyd3 is similar to TPR2 of Hop (RMSD of 4.9 Å based on 128 Cα atoms), the C-terminal region of the second TPR motif (α12) forms a long distorted α helix, and hence the α helices at the C-terminus are almost perpendicular to their corresponding regions in the structure of the Hop-Hsp90 peptide complex. The molecular mechanism underlying the Smyd3-Hsp90α interaction is under investigation.

Histone binding pocket 

Our attempts to obtain a structure of Smyd3 in complex with a histone peptide were unsuccessful. We therefore superposed the Smyd3-AdoHcy structure onto the structure of SET7/9 in complex with a methylated histone peptide (PDB code 1O9S) (25) (Figures 1B and 2A). In SET7/9, a narrow lysine channel connects the cofactor binding site and the histone peptide binding site (25). In the Smyd3-AdoHcy complex, a similar channel is
Figure 1. Structure of the Smyd3-AdoHcy complex. (A) Overall structure of the Smyd3-AdoHcy complex. Top: a schematic representation of full-length Smyd3 with the N-terminal SET domain (residues 1-43 and 94-242), the MYND domain (residues 44-93), the post-SET domain (residues 243-270) and the C-terminal region (residues 271-428) colored in magenta, yellow, cyan and blue, respectively. Bottom: two views of the overall structure of the Smyd3-AdoHcy complex. The domains are colored accordingly and the secondary structure elements are marked. The cofactor product AdoHcy is shown as a ball-and-stick model colored in cyan. (B) Structural comparison of the SET and post-SET domains of Smyd3 with the equivalent regions of SET7/9 (PDB code 1O9S). Superposition of the Smyd3 and SET7/9 structures was performed based on the core region of the SET domain. SET7/9 is colored in green, and the color coding for Smyd3 is the same as in Figure 1A. The cofactors are shown as ball-and-stick models and colored accordingly. (C) Comparison of the Zn²⁺-binding site in the catalytic core of Smyd3 with the equivalent regions of SET7/9 (left panel) and Dim5 (PDB code 1PEG, right panel). The post-SET regions of Smyd3, SET7/9 and Dim5 are shown as ribbon representations colored in cyan, green and wheat, respectively. The side chains of the involved Cys residues and the bound cofactors are shown as ball-and-stick models and colored accordingly. The Zn²⁺ ions are shown as spheres and colored accordingly. The secondary structure elements and the involved Cys residues in Smyd3 are labeled. (D) Zinc-binding sites in the MYND domain. The MYND domain (yellow) is characterized by a C6HC zinc-chelating motif. The side chains of the Cys and His residues chelating the two Zn²⁺ ions are shown as ball-and-stick models. The Zn²⁺ ions are shown as spheres. The secondary structure elements and the involved residues are labeled.
Figure 2. Histone binding pocket. (A) Potential histone peptide binding site. Superposition of the Smyd3 and SET7/9 (PDB code 1O9S) structures was performed as in Figure 1B. Smyd3 is shown in a ribbon representation with the same color coding as in Figure 1A. For simplicity, only the bound histone peptide of SET7/9 is shown, as a ribbon colored in green, and the side chain of the methyllysine of the histone peptide is shown as a ball-and-stick model colored accordingly. The cofactor is also shown as a ball-and-stick model. (B) Electrostatic potential surface of the potential histone peptide binding pocket in Smyd3. The surface charge distribution is displayed as blue for positive, red for negative and white for neutral. A close-up view of the pocket (middle panel) shows that several acidic patches are present at the opening of the binding pocket. Some of the acidic residues in these patches are labeled. The TPR domain forms part of the substrate binding pocket, and removal of the TPR domain would leave an incomplete pocket (right panel). (C) HKMT activity assays of the wild-type Smyd3 and the mutants with truncation or mutations at the substrate binding pocket. The activities of the truncate with deletion of the C-terminal region (Δ277-428) and of the mutants carrying one or two point mutations of residues potentially involved in histone binding were determined. Activity is shown relative to that of the wild-type protein. The experiments were performed in triplicate, and the error bars indicate the standard deviation.



present (see details later), and at one end of the channel AdoHcy binds at an almost identical position as in the SET7/9 complex (Figure 1B). Thus, we reason that the histone substrate of Smyd3 should bind at the other end of the channel, as observed in SET7/9. In the SET7/9 structures, the histone peptide binding site is quite open because the N-terminal pre-SET domain is distant from the active site (21,25). In the Smyd3 structure, however, the TPR domain encloses a large part of the substrate binding site and, together with the SET and post-SET domains, forms a deep and narrow substrate binding pocket (Figure 2A and B). Detailed examination of the pocket shows that several acidic residues, including Glu192 and Asp241 of the SET domain and Asp332 of the TPR domain, are located at the opening of the substrate binding pocket and might be involved in histone binding (Figure 2B).

The mutagenesis analyses clearly show that truncation of the TPR domain (Δ277-428) resulted in an approximately 4-fold decrease in HKMT activity, indicating an important, but not essential, role of the TPR domain (Figure 2C). Furthermore, mutation of either Asp241 or Asp332 to Ala indeed had a negative impact, with a 60-75% reduction in the HKMT activity of Smyd3, and, intriguingly, double mutation of Asp241 and Asp332 almost completely abrogated the enzymatic activity (Figure 2C). Mutation of Glu192 to Ala had a minor effect, implying that Glu192 might not be directly involved in histone binding (Figure 2C). Taken together, the structural and biochemical data demonstrate that the TPR domain and the acidic character of the opening of the substrate binding pocket are important for the HKMT activity of Smyd3. In particular, residues Asp241 of the SET domain and Asp332 of the TPR domain play key roles in the HKMT reaction, most likely through binding of the histone substrate.

Cofactor binding pocket 

At the active site of the Smyd3-AdoHcy structure, AdoHcy is bound in a pocket surrounded by the β1-β2 loop, the η1-η2 loop and β9 of the SET domain and the α6-α7 loop of the post-SET domain (Figure 1B), with well-defined electron density (Figure 3A). Similar to AdoHcy bound in the SET7/9 structures (18,25), the cofactor takes a
Figure 3. Cofactor binding pocket. (A) A representative simulated-annealing omit Fo-Fc electron density map (contoured at 2.0σ) for the cofactor product AdoHcy. (B) A detailed comparison of the cofactor binding mode of Smyd3 (left panel) with that of SET7/9 (right panel, PDB code 1O9S). Hydrogen bonds are indicated with black dotted lines. Residues contributing to cofactor binding through their side chains are labeled in orange, and those contributing through their backbones in black. Depending on the contributing moieties, the side chains or backbones of the residues involved in AdoHcy binding are shown as ball-and-stick models. The color coding is the same as in Figure 1A. (C) HKMT activity assays of the wild-type Smyd3 and the mutants carrying point mutations at the cofactor binding pocket.



U-shaped conformation in the Smyd3 structure (Figure 3B). The adenine ring of AdoHcy makes a π-π stacking interaction with the side chain of Phe259, and the N6 atom of the adenine forms a hydrogen bond with the main-chain carbonyl of His206. To stabilize the ribose ring, the O2' atom forms a hydrogen bond with the side-chain amino group of Asn132, and the O3' atom forms hydrogen bonds with the side-chain carbonyl of Asn132 and the main-chain carbonyl of Tyr257. The amino group of the homocysteine moiety is hydrogen-bonded to the main-chain carbonyl groups of Arg14 and Asn16 and the side-chain carbonyl of Asn205. The carboxyl group forms hydrogen bonds with the main-chain amide of Asn16 and the phenolic hydroxyl of Tyr124.
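The hydrogen bonds listed above are geometric assignments made from the refined model; the article does not state the criteria used. One common first-pass heuristic flags ligand/protein nitrogen or oxygen pairs closer than about 3.5 Å. The sketch below implements only that distance test, with no donor/acceptor or angle checks. The 3.5 Å cutoff, the function name and the coordinates in the usage example are assumptions for illustration, not values from the deposited model.

```python
import math

POLAR = {"N", "O"}


def polar_contacts(ligand_atoms, protein_atoms, cutoff=3.5):
    """Return (ligand atom, protein atom, distance) for N/O pairs within `cutoff` Å.

    Atoms are (label, element, (x, y, z)) tuples. Only distance is tested;
    donor/acceptor roles and angles are ignored in this sketch.
    """
    contacts = []
    for lig_label, lig_el, lig_xyz in ligand_atoms:
        if lig_el not in POLAR:
            continue
        for prot_label, prot_el, prot_xyz in protein_atoms:
            if prot_el not in POLAR:
                continue
            d = math.dist(lig_xyz, prot_xyz)
            if d <= cutoff:
                contacts.append((lig_label, prot_label, round(d, 2)))
    # Closest contacts first.
    return sorted(contacts, key=lambda c: c[2])


if __name__ == "__main__":
    # Hypothetical coordinates (Å), not taken from the Smyd3 structure.
    ligand = [("AdoHcy O2'", "O", (0.0, 0.0, 0.0)), ("AdoHcy C1'", "C", (1.4, 0.0, 0.0))]
    protein = [
        ("Asn132 ND2", "N", (2.9, 0.0, 0.0)),
        ("Asn132 CB", "C", (3.5, 1.0, 0.0)),
        ("Tyr124 OH", "O", (0.0, 4.2, 0.0)),
    ]
    for lig, prot, d in polar_contacts(ligand, protein):
        print(f"{lig} ... {prot}: {d} Å")
```

A distance list like this is only a starting point; assignments such as the water-mediated contact proposed for Asp262 cannot be made this way at 2.8 Å resolution.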

Besides the residues that interact directly with AdoHcy, residue Asp262 of the post-SET domain might contribute to cofactor binding, as it forms a salt bridge with Arg14 that stabilizes the positions of Arg14 and of the β1-β2 loop where Arg14 is located, and it may interact with the N1 and N6 atoms of the adenine moiety via a water molecule not visible at this resolution (Figure 3B). In addition, similar to its equivalent (Tyr335) in SET7/9 (25), Tyr239 may also make van der Waals interactions with the O4' atom of the ribose ring (Figure 4A).

Although in Smyd3 the cofactor binds at an equivalent position and assumes a similar conformation as in SET7/9 (18,25), there are notable differences in the cofactor binding mode between Smyd3 and SET7/9. As shown in Figure 3B, in the SET7/9-AdoHcy structure, the adenine ring of AdoHcy makes a π-π stacking interaction with Trp352, and the N1 atom forms a hydrogen bond with Glu356 of the post-SET domain. The carboxylate of the homocysteine moiety is stabilized not by a residue equivalent to Tyr124 but by the main-chain carbonyl and amide of Glu228. In addition, in SET7/9 the ribose moiety makes no interactions with the surrounding residues.

The functional roles of these residues had not previously been validated by mutagenesis, except for the equivalents of Asn205 (21) and Tyr239 (20-22,27). Since Tyr239 has other important functions besides cofactor binding, the results of the mutagenesis analyses of Tyr239 will be discussed later. As shown in Figure 3C, mutations of Tyr124, Asn132, Phe259 and Asp262 to Ala substantially impaired the HKMT activity of Smyd3, indicating the importance of their functional roles. In particular, mutation of Phe259, which interacts with the adenine ring, and mutation of Asn132, which is hydrogen-bonded to both O2' and O3' of the ribose, almost completely abolished the enzymatic activity of Smyd3. Since Phe259 belongs to the post-SET domain, these results are consistent with a critical role of the post-SET domain in cofactor binding and the HKMT activity of Smyd3.

Lysine channel and implications on the catalytic 
mechanism 

In the SET7/9 structure, a lysine channel connecting the 
histone peptide binding site and the cofactor binding site is 
found to accommodate the side chain of the methylated 
Figure 4. Structure of the lysine channel. (A) Comparison of the lysine channels of Smyd3 and SET7/9 (PDB code 1O9S). Superposition of the active sites of Smyd3 and SET7/9 reveals a narrow channel in Smyd3 that is similar to the lysine channel of SET7/9. The key residues forming the lysine channel are shown as ball-and-stick models. The color coding is the same as in Figure 1A. (B) Superposition of the active sites of Smyd3 and pea Rubisco LSMT (PDB code 1P0Y). The residues are shown as ball-and-stick models, and for clarity, the side chains of Asn181 of Smyd3 and Arg222 of Rubisco LSMT are hidden. The color coding for Smyd3 is the same as in Figure 1A, and Rubisco LSMT is colored in light gray. (C) HKMT activity of the wild-type Smyd3 and the mutants carrying point mutations at the lysine channel.



H3K4 (25). In the Smyd3 structure, a narrow channel is also found at a similar site and is thus a potential binding site for the side chain of H3K4 (Figure 4A). However, a detailed comparison of the two enzymes reveals significant differences at this site: the lysine channel of SET7/9 is formed mainly by residues Leu267, Tyr245, Tyr305, Tyr335 and Tyr337, whereas the lysine channel of Smyd3 contains only two Tyr residues (Tyr257 and Tyr239) (Figure 4A). In Smyd3, the positions of Leu267 and Tyr305 of SET7/9 are occupied by Phe183 and Ile214, respectively, and those where the phenol rings of Tyr337 and Tyr245 reside are vacant. In addition, although the position of the hydroxyl of Tyr337 is occupied by the hydroxyl of Tyr257, the side chain of Tyr257 approaches this site from the opposite side, and correspondingly its hydroxyl group points in a different direction. Surprisingly, further comparison of the active-site structure of Smyd3 with those of other lysine methyltransferases shows that the lysine channel of Smyd3 most resembles that of a non-HKMT, pea Rubisco large subunit methyltransferase (LSMT) (20), which, like Smyd3, is able to catalyze tri-methylation of lysine (Figure 4B). The base of the lysine channel should be the binding site for the methyl groups of the lysine substrate. In Smyd3, the base of the lysine channel, which is surrounded by Phe183, Asn181 and Tyr239, is slightly larger than that in Rubisco LSMT. Further examination and comparison of the Smyd3 structure with that of the Y245A mutant of SET7/9 in complex with a tri-methylated peptide of TAF10 (PDB code 3M5A) clearly show that the base of the lysine channel in Smyd3 is large enough to accommodate three methyl groups (Supplementary Figure S4), as required for its activity to tri-methylate the substrate.

In all three structures, the Tyr residues (Tyr239 of
Smyd3, Tyr335 of SET7/9 and Tyr287 of Rubisco LSMT),
which correspond to the absolutely conserved Tyr residue
in all identified SET enzymes, adopt similar positions
and configurations (Figure 4A and B). During the
methyl transfer reaction, the methyl-acceptor Lys must
be deprotonated, and it has been proposed that Tyr
residues at the active sites of methyltransferases are
responsible for deprotonating the Nε group of the Lys substrate
(18,20,22,23,25). Specifically, Trievel et al. originally
proposed that in pea Rubisco LSMT, methyl
transfer is catalyzed by Tyr287 (20). However, the same group
later called this notion into question: in their
ternary structure of Rubisco LSMT in complex with a
free ε-N-methyllysine and AdoHcy, the Nε group of the Lys
forms a hydrogen bond with the main-chain carbonyl of
Arg222 and is more than 3.3 Å away from the Tyr287
hydroxyl (44). In addition, a study of a viral histone
H3K27 methyltransferase argued that the corresponding
Tyr of this enzyme is unlikely to act as a general base, as
the Tyr-to-Phe mutant exhibited little activity at pH 8.0 or
even above pH 9.0 (27). More recently, however, a deprotonating
role for SET7/9-Tyr335 has again been supported by simulation
studies of SET7/9, which indicate that prior to
cofactor binding, Tyr335 can be deprotonated by
bulk water molecules and then act as a general base to
deprotonate the Lys amine group (45). Thus, the mechanism
by which the Nε group of the Lys substrate is deprotonated
remains unresolved and deserves further
investigation.
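The distance argument above (an Nε group lying more than 3.3 Å from the Tyr287 hydroxyl) is the kind of check that can be run directly on deposited coordinates. The sketch below is a minimal illustration, not the analysis used in the cited work: it reads x, y and z from fixed-width PDB ATOM/HETATM records (columns 31 to 54 in the PDB format) and compares an interatomic distance against a cutoff. The 3.3 Å default only mirrors the figure quoted above, and the function names and example coordinates are hypothetical.

```python
import math

# Cutoff mirroring the 3.3 Å figure quoted in the text. Many analyses use
# values between about 3.2 and 3.5 Å, so treat this as an adjustable assumption.
DEFAULT_CUTOFF_A = 3.3


def atom_xyz(pdb_line: str) -> tuple[float, float, float]:
    """Return (x, y, z) in Å from a fixed-width PDB ATOM/HETATM record.

    In the PDB format, x, y and z occupy columns 31-38, 39-46 and 47-54
    (1-based), i.e. the Python slices [30:38], [38:46] and [46:54].
    """
    if not pdb_line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return (
        float(pdb_line[30:38]),
        float(pdb_line[38:46]),
        float(pdb_line[46:54]),
    )


def distance(a, b) -> float:
    """Euclidean distance between two 3D points (Å)."""
    return math.dist(a, b)


def within_cutoff(a, b, cutoff: float = DEFAULT_CUTOFF_A) -> bool:
    """True if two atoms are close enough to count as a contact under this cutoff."""
    return distance(a, b) <= cutoff


if __name__ == "__main__":
    # Made-up coordinates for a Lys Nε and a Tyr hydroxyl oxygen (illustration only).
    nz = atom_xyz("ATOM".ljust(30) + "  10.000  10.000  10.000")
    oh = atom_xyz("ATOM".ljust(30) + "  10.000  10.000  13.600")
    print(round(distance(nz, oh), 2), within_cutoff(nz, oh))  # prints: 3.6 False
```

A geometric cutoff alone says nothing about donor/acceptor angles or protonation states, which is part of why the distance-based argument cited above remained open to reinterpretation.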

Although the active site of Smyd3 is similar to that of 
Rubisco LSMT, there are notable differences (Figure 4B). 
In particular, Asn181 of Smyd3, equivalent to Arg222 of
Rubisco LSMT, is unlikely to interact with the Lys substrate,
as the main-chain carbonyl of Asn181 points away
from the lysine channel and this conformation of Asn181
is stabilized by the surrounding residues. We examined the
effect of mutating the residues that form the lysine channel
on the HKMT activity of
Smyd3, including Phe183, the only two Tyr residues
(Tyr239 and Tyr257), and Ser202. Mutation of Phe183
to Ala completely abrogated the HKMT activity of
Smyd3, indicating a critical role of Phe183 in maintaining
the integrity of the lysine channel and in the interaction
of Smyd3 with the Lys substrate (Figure 4C). The
S202A and Y257F mutants retain more than half of the
wild-type HKMT activity, indicating that Ser202 and Tyr257 play less important
roles in catalysis (Figure 4C). In contrast, mutation of
Tyr239 to Phe or Ala completely abolished the enzymatic
activity, demonstrating a key role of this residue in the
HKMT reaction (Figure 4C).
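Statements such as "retain more than half of the HKMT activity" and "completely abolished" rest on normalizing each mutant to the wild-type signal. A minimal sketch of that normalization, assuming a background (no-enzyme) reading is subtracted first, is shown below; the function names, the 5% "abolished" threshold and the example signals are hypothetical and are not data from Figure 4C.

```python
def relative_activity(mutant: float, wild_type: float, background: float = 0.0) -> float:
    """Percent of wild-type activity after subtracting a no-enzyme background."""
    wt_net = wild_type - background
    if wt_net <= 0:
        raise ValueError("wild-type signal must exceed background")
    # Clamp at zero so readings below background are not reported as negative activity.
    return 100.0 * max(mutant - background, 0.0) / wt_net


def describe(percent: float, abolished_below: float = 5.0) -> str:
    """Coarse wording as used in the text; the 5% threshold is an arbitrary assumption."""
    if percent < abolished_below:
        return "activity abolished"
    if percent > 50.0:
        return "retains more than half"
    return "reduced"


if __name__ == "__main__":
    # Made-up assay signals, for illustration only.
    wt, bg = 1000.0, 100.0
    for name, signal in [("mutant A", 640.0), ("mutant B", 105.0)]:
        pct = relative_activity(signal, wt, bg)
        print(name, round(pct, 1), describe(pct))
```

In practice such comparisons are made on replicate measurements, so a real analysis would compare means with their spread rather than single readings.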

Mutation of Tyr239 may affect the HKMT activity of 
Smyd3 in three ways: change of the cofactor binding 
affinity, disruption of the lower part of the lysine 
channel and loss of the ability to accelerate deprotonation 
of the Nε group of the Lys substrate. The possibility that
Tyr239 is involved in cofactor binding is inferred
from the observation that its equivalent in SET7/9
(Tyr335) makes van der Waals contacts with the ribose
ring of AdoHcy (18). However, in Smyd3, the effect of
the Tyr239 mutation on cofactor binding is likely to be
minor, as the ribose ring of AdoHcy is stabilized by three
hydrogen bonds with Asn132 and Tyr257 (Figure 3B), and
any interactions of Tyr239 with the cofactor
would be expected to contribute little to its
binding. It is also noteworthy that the hydroxyl group
of Tyr239 forms a hydrogen bond with the main-chain
carbonyl of Leu204 of the η4-β9 loop, which might
stabilize the position of the side chain of Tyr239; hence,
mutation of Tyr239 to Ala or Phe might disrupt
the lysine channel owing to the loss of the large side chain
in the Y239A mutant or the instability of the phenol ring
in the Y239F mutant. Therefore, we further mutated
Tyr239 to Lys or to the neutral residue Gln, both of which
might retain the interaction with the carbonyl of Leu204
but are unable to deprotonate a Lys amine. Again, both
the Y239K and Y239Q mutations led to complete loss of
the enzymatic activity of Smyd3 (Figure 4C). However,
whether the lysine channel remains intact in these mutants
needs further investigation. Taken together, the structural
analyses and biochemical assays indicate that the hydroxyl 
group of Tyr239 is critical for the HKMT activity of 
Smyd3. 

Modulation of the HKMT activity via DNA binding of the MYND domain

The MYND domain was originally named for its
presence in ETO (also named myeloid tumor gene 8)
and the Drosophila proteins Nervy and Deaf1 (46), and has since
been found in numerous other proteins, including the
Smyd proteins (47), BS69 (48), mammalian programmed
cell death protein 2 (49), and AML1/ETO, a fusion protein
resulting from the t(8;21) translocation in
acute myeloid leukemia. Comparison of the structure of
the MYND domain of Smyd3 with other reported
MYND structures shows that it most closely resembles the
MYND domain of AML1/ETO (50), with an RMSD of
0.57 Å. Despite a very similar overall structure, there are
differences in certain regions. For example, a loop
encompassing residues 55-59 in Smyd3 adopts a conformation
different from that of the equivalent loop in AML1/ETO.
In addition, residues 86-93 of Smyd3 form helix
α2, whereas the equivalent residues form a loop in AML1/ETO.
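An RMSD such as the 0.57 Å quoted above is computed over paired atoms after optimal rigid-body superposition. The text does not state which atoms were paired, so the sketch below assumes already-matched coordinate sets (for example, equivalent C-alpha atoms) and uses the standard Kabsch algorithm via NumPy's SVD. It is an illustration of the technique, not the procedure used for this comparison; programs such as PyMOL's `align` also perform iterative outlier rejection by default, so their reported values can differ from a plain all-pairs Kabsch RMSD.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between matched coordinate sets P and Q after optimal superposition.

    P and Q are (N, 3) arrays whose rows are already paired (e.g. equivalent
    C-alpha atoms from an alignment). Units are those of the input (e.g. Å).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the centered P onto the centered Q and average the squared deviations.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a structure should give an RMSD of essentially zero, which is a quick sanity check before applying the function to real paired coordinates.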

It has been reported that the MYND domain of
AML1/ETO binds the PPPLI motif of SMRT, a corepressor of
retinoid and thyroid hormone receptors (50). In
the structure of the MYND domain of AML1/ETO
fused to the PPPLI motif of SMRT, the conformation of
the MYND domain is similar to that of the
unbound form, and three residues of the MYND domain,
Ser675, Glu688 and Trp692, located in a hydrophobic pocket,
interact with the PPPLI motif of the SMRT
peptide (50). In the MYND domain of Smyd3, the equivalent
residues (Ser63, Glu76 and Trp80, respectively) are
conserved. However, the electrostatic properties of the
residues surrounding the hydrophobic pocket differ
substantially: the acidic residues
Glu672 and Glu693 of AML1/ETO are replaced with the hydrophobic
residues Met60 and Pro81, and residues
Thr673 and His689 are replaced with the highly basic residues Arg61 and
Lys77 (Figure 5A). With additional Lys
and Arg residues present, the surface of the MYND domain in
Smyd3 is largely positively charged (Figure 5A), which is
consistent with its potential role in binding
specific DNA sequences such as 5'-CCCTCC-3' and in
the transcriptional regulation of targets
including Nkx2.8 (6).
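The substitutions listed above can be tallied to see why the pocket surroundings become more basic. The sketch below is only bookkeeping, not an electrostatics calculation: it assigns approximate side-chain charges near neutral pH (treating His as neutral, which is an assumption since its pKa is close to 6) and sums the charge change over the four Smyd3/AML1/ETO pairs named in the text. A surface potential map such as Figure 5A typically comes from a continuum electrostatics calculation (for example, a Poisson-Boltzmann solver), which this does not replace; the dictionary and function names are hypothetical.

```python
# Approximate side-chain charges near neutral pH. Histidine is treated as
# neutral here (pKa ~6), an assumption; real protonation states depend on context.
SIDE_CHAIN_CHARGE = {"Asp": -1, "Glu": -1, "Lys": +1, "Arg": +1}

# Substitutions around the hydrophobic pocket as listed in the text:
# (Smyd3 residue, residue type at the equivalent AML1/ETO position).
POCKET_SUBSTITUTIONS = [
    ("Met60", "Glu"),
    ("Pro81", "Glu"),
    ("Arg61", "Thr"),
    ("Lys77", "His"),
]


def charge(residue: str) -> int:
    """Charge of a residue given as 'Lys77' or 'Lys' (three-letter code first)."""
    return SIDE_CHAIN_CHARGE.get(residue[:3].capitalize(), 0)


def net_charge_shift(pairs) -> int:
    """Sum over pairs of (Smyd3 charge - AML1/ETO charge)."""
    return sum(charge(smyd3) - charge(eto) for smyd3, eto in pairs)


if __name__ == "__main__":
    for smyd3, eto in POCKET_SUBSTITUTIONS:
        print(f"{eto} -> {smyd3}: {charge(smyd3) - charge(eto):+d}")
    print("net shift:", net_charge_shift(POCKET_SUBSTITUTIONS))
```

Each of the four substitutions adds one unit of positive charge relative to AML1/ETO under these assumptions, giving a net shift of +4 around the pocket before the additional Lys and Arg residues mentioned above are counted.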

To test the hypothesis that the MYND domain of
Smyd3 may directly bind the DNA element, we generated two Smyd3
mutants: R66E, in which an
acidic residue is placed on the positively charged
MYND surface, and a negative control mutant, E192A,
which has an intact MYND domain. The wild-type and
mutant Smyd3 proteins were incubated with the potential
target, a 6-bp duplex DNA with the sequence 5'-CCCTCC-3'.
As shown in Figure 5B, the DNA exhibited a
slower migration rate in the presence of wild-type
Smyd3 or the E192A mutant but not the R66E mutant. It
has been reported that pre-incubation of the yeast
methyltransferase Dot1p with DNA can stimulate its HKMT
activity on histones (51); therefore, we further examined
the possibility that the DNA-binding ability of the
MYND domain might affect the HKMT activity of
Smyd3. In the presence of the potential target DNA, the
HKMT activity of wild-type Smyd3 was indeed
increased in a dose-dependent manner (Figure 5C). This
enhancement of HKMT activity by DNA, however,
did not occur when Arg66 was mutated to Glu, indicating
that the DNA-binding ability of the MYND domain is
essential for the stimulation of Smyd3 enzymatic activity
by DNA. Taken together, our data demonstrate for the
first time that DNA binding by Smyd3 stimulates its
HKMT activity, and that the MYND domain may mediate this
process through direct binding to the target DNA.
Intriguingly, comparison of the MYND domains of the
Smyd proteins shows that some of the basic residues of
Smyd3 (Arg61, Arg66, Lys77 and Lys84) are conserved in
Smyd1 and Smyd2 (Figure 5D), implying that Smyd1 and
Smyd2 may utilize a similar mechanism to regulate their
HKMT activities.

Figure 5. Modulation of the HKMT activity of Smyd3 via DNA binding of the MYND domain. (A) Electrostatic potential surface of the MYND
domain with the negative charge in red and the positive charge in blue. All the positively charged residues are labeled. (B) Analysis of the interaction
between the wild-type and mutant Smyd3 proteins and the potential target DNA with gel shift assays. The wild-type (WT) Smyd3 and the R66E and
E192A mutants were added to 1 ng DNA at the molar ratios indicated. DNA binding of the wild-type and mutant Smyd3 was analyzed
by agarose gel electrophoresis. (C) HKMT assays of the wild-type and R66E mutant Smyd3 in the presence and absence of the potential target
DNA. The HKMT activity of the wild-type but not the mutant Smyd3 is stimulated when the potential target DNA is supplemented in the HKMT
reaction system. (D) Sequence alignment of the MYND domain among all human Smyd proteins. The sequence numbers and secondary structure
elements of Smyd3 are marked. The invariant residues across these proteins are denoted with filled red boxes, and the highly conserved ones with open
boxes. The residues conserved in Smyd1-4 but not Smyd5 are denoted with asterisks.
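The gel-shift assay in Figure 5B uses 1 ng of a 6-bp duplex at several protein:DNA molar ratios, which requires converting masses to moles. The sketch below shows that arithmetic under stated assumptions: it uses the common approximation MW(dsDNA) ≈ n_bp × 617.96 + 36.04 g/mol, and a Smyd3 molecular weight of about 49 kDa, which is an assumed round value (the mass of the actual expression construct should be used instead). The function names and the ratios in the usage example are hypothetical, not the ratios used in the experiment.

```python
# Commonly used approximation for double-stranded DNA:
# MW(dsDNA) ~= n_bp * 617.96 + 36.04 g/mol. An assumption for illustration.
BP_MASS = 617.96
END_CORRECTION = 36.04


def dsdna_mw(n_bp: int) -> float:
    """Approximate molecular weight (g/mol) of an n_bp DNA duplex."""
    return n_bp * BP_MASS + END_CORRECTION


def dna_pmol(mass_ng: float, n_bp: int) -> float:
    """Picomoles of duplex contained in mass_ng nanograms."""
    return mass_ng * 1e-9 / dsdna_mw(n_bp) * 1e12


def protein_ng(molar_ratio: float, dna_pmol_amount: float, protein_mw: float) -> float:
    """Nanograms of protein (MW in g/mol) for a given protein:DNA molar ratio."""
    return molar_ratio * dna_pmol_amount * 1e-12 * protein_mw * 1e9


if __name__ == "__main__":
    dna = dna_pmol(1.0, 6)  # 1 ng of the 6-bp duplex, as in the gel-shift assay
    for ratio in (1, 5, 10):  # hypothetical molar ratios
        # 49,000 g/mol is an assumed approximate mass for Smyd3.
        print(ratio, round(protein_ng(ratio, dna, 49_000.0), 1))
```

Under these assumptions, 1 ng of the 6-bp duplex corresponds to roughly 0.27 pmol, so even equimolar protein amounts to about 13 ng.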

PROTEIN DATA BANK ACCESSION CODES 

The structures of Smyd3 in crystal forms I, II and III have
been deposited with the RCSB Protein Data Bank under
accession codes 3OXF, 3OXL and 3OXG, respectively.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 